Boosting Precision and Recall of Hyponymy Relation Acquisition from Hierarchical Layouts in Wikipedia
Authors
Abstract
This paper proposes an extension of Sumida and Torisawa’s method for acquiring hyponymy relations from hierarchical layouts in Wikipedia (Sumida and Torisawa, 2008). We extract hyponymy relation candidates (HRCs) from the hierarchical layouts in Wikipedia by regarding all subordinate items of an item x in a hierarchical layout as x’s hyponym candidates, whereas Sumida and Torisawa (2008) extracted only the direct subordinate items of x as its hyponym candidates. We then select plausible hyponymy relations from the acquired HRCs by running a machine-learning-based filter with novel features, which also improve the precision of the resulting hyponymy relations. Experimental results show that we acquired more than 1.34 million hyponymy relations with a precision of 90.1%.
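The difference between the two candidate-extraction rules can be illustrated with a minimal sketch. It assumes a hierarchical layout (article title, section headings, nested list items) has already been parsed into a tree; the `Item` class, the function names, and the sample layout are hypothetical illustrations, not the authors’ code or data, and the machine-learning filter is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical node of a parsed layout: a title, a heading, or a list item."""
    text: str
    children: list[Item] = field(default_factory=list)


def direct_hrcs(root: Item) -> list[tuple[str, str]]:
    """Pair each item only with its direct subordinate items (Sumida and Torisawa, 2008)."""
    pairs: list[tuple[str, str]] = []

    def walk(node: Item) -> None:
        for child in node.children:
            pairs.append((node.text, child.text))  # (hypernym, hyponym) candidate
            walk(child)

    walk(root)
    return pairs


def all_subordinate_hrcs(root: Item) -> list[tuple[str, str]]:
    """Pair each item with every item below it, at any depth (the proposed extension)."""
    pairs: list[tuple[str, str]] = []

    def walk(node: Item, ancestors: list[str]) -> None:
        # Every ancestor of `node` becomes a hypernym candidate for it.
        for hypernym in ancestors:
            pairs.append((hypernym, node.text))
        for child in node.children:
            walk(child, ancestors + [node.text])

    walk(root, [])
    return pairs


# Hypothetical sample layout, not taken from the paper.
layout = Item("Musical instrument", [
    Item("String instruments", [Item("Guitar"), Item("Violin")]),
    Item("Wind instruments", [Item("Flute")]),
])

print(len(direct_hrcs(layout)))           # 5
print(len(all_subordinate_hrcs(layout)))  # 8
```

The three extra candidates, such as ("Musical instrument", "Guitar"), are the kind of pairs the extension adds for recall. Pairing across more levels can also produce implausible candidates (for instance, under generic headings), which the subsequent filtering step is meant to remove.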
Similar resources
Hacking Wikipedia for Hyponymy Relation Acquisition
This paper describes a method for extracting a large set of hyponymy relations from Wikipedia. Wikipedia is much more consistently structured than generic HTML documents, and we can extract a large number of hyponymy relations with simple methods. In this work, we managed to extract more than 1.4 × 10^6 hyponymy relations with 75.3% precision from the Japanese version of Wikipedia. To th...
Pattern-Based Ontology Construction from Selected Wikipedia Pages
In this paper, we describe how ontologies can be built automatically from definitions obtained by searching Wikipedia for lexico-syntactic patterns based on the hyponymy relation. First, we describe how definitions are retrieved and processed while taking into account both recall and precision. Further, concentrating only on precision, we show how a consistent and useful domain ontology can be ...
Co-STAR: A Co-training Style Algorithm for Hyponymy Relation Acquisition from Structured and Unstructured Text
This paper proposes a co-training style algorithm called Co-STAR that acquires hyponymy relations simultaneously from structured and unstructured text. In Co-STAR, two independent processes for hyponymy relation acquisition – one handling structured text and the other handling unstructured text – collaborate by repeatedly exchanging the knowledge they acquired about hyponymy relations. Unlike co...
Coarse to Fine: Diffusing Categories in Wikipedia
Automatic taxonomy construction aims to build a categorization system without human effort. Traditional methods based on textual patterns extract hyponymy relations from raw text. However, these methods usually yield low precision and recall. In this paper, we propose a method to automatically find diffusing attributes for a category from Wikipedia infoboxes. We use the diffusing attribute to diffuse...
Hyponym Extraction from the Web based on Property Inheritance of Text and Image Features
Concept hierarchy knowledge, such as hyponymy and meronymy, is very important for various natural language processing systems. While WordNet and Wikipedia are constructed and maintained manually as lexical ontologies, many researchers have tackled the automatic, rather than manual, extraction of concept hierarchies from very large corpora of text documents such as the Web. However, their met...